The role of fine-grained annotations in supervised recognition of risk factors for heart disease from EHRs

Authors

  • Kirk Roberts
  • Sonya E. Shooshan
  • Laritza Rodriguez
  • Swapna Abhyankar
  • Halil Kilicoglu
  • Dina Demner-Fushman
Abstract

This paper describes a supervised machine learning approach for identifying heart disease risk factors in clinical text, and assesses the impact of annotation granularity and quality on the system's ability to recognize these risk factors. We use a series of support vector machine models in conjunction with manually built lexicons to classify triggers specific to each risk factor. The features used for classification are quite simple: they rely only on lexical information and ignore higher-level linguistic information such as syntax and semantics. Instead, we incorporated high-quality data to train the models by annotating additional information on top of a standard corpus. Despite its relative simplicity, the system achieves the highest scores (micro- and macro-F1, and micro- and macro-recall) among the 20 participants in the 2014 i2b2/UTHealth Shared Task. It obtains a micro- (macro-) precision of 0.8951 (0.8965), recall of 0.9625 (0.9611), and F1-measure of 0.9276 (0.9277). Additionally, we perform a series of experiments to assess the value of the annotated data we created. These experiments show how manually labeled negative annotations can improve information extraction performance, demonstrating the importance of high-quality, fine-grained natural language annotations.
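
As a quick consistency check on the figures quoted in the abstract, the micro-averaged F1 can be recomputed from the micro-averaged precision and recall using the standard harmonic-mean definition. The sketch below is illustrative only: the function name `f1_score` is our own and is not taken from the paper. The macro-F1 is not recomputed, because the abstract does not say how it was averaged, and it is commonly taken as the mean of per-class F1 values, which are not given here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition).

    Hypothetical helper for illustration; not code from the paper.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Micro-averaged scores as reported in the abstract.
micro_precision = 0.8951
micro_recall = 0.9625

micro_f1 = f1_score(micro_precision, micro_recall)
print(f"micro-F1 = {micro_f1:.4f}")
```

Rounded to four decimal places, the result agrees with the micro-F1 of 0.9276 reported above.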

Similar resources

THE ROLE OF BEHAVIOR PATTERN AND EMOTIONAL RISK FACTORS IN CORONARY HEART DISEASE

For evaluating the role of behavior pattern and emotional factors in coronary heart disease (CHD), 86 patients were followed for one to three years (average 20 months). The behavior pattern itself was not considered as a main risk factor, rather it was found to be an aggravating and predisposing factor, especially in morbidity rate. Both behavior patterns were more common in males than in ...

Prevalence of major coronary heart disease risk factors in Iran

Background and aims: Coronary heart diseases (CHDs) contribute to mortality, morbidity, disability, productivity and quality of life. This study aimed to determine the prevalence of major risk factors for CHD in the provinces of Iran. Methods: This was a secondary, descriptive study based on pre-existing data. Prevalence of non-communicable disease (NCD) risk factors was def...

Weakly-supervised Discriminative Patch Learning via CNN for Fine-grained Recognition

Research on fine-grained recognition has recently shifted from multistage frameworks to convolutional neural networks (CNN) that are trained end-to-end. Many previous end-to-end deep approaches typically consist of a recognition network and an auxiliary localization network trained with additional part annotations to detect semantic parts shared across classes. To avoid the cost of extra semant...

Weakly Supervised Fine-Grained Image Categorization

In this paper, we categorize fine-grained images without using any object / part annotation in either the training or the testing stage, a step towards making it suitable for deployments. Fine-grained image categorization aims to classify objects with subtle distinctions. Most existing works heavily rely on object / part detectors to build the correspondence between object parts by using o...

Investigating the awareness of inter-city bus drivers and truck drivers on coronary heart diseases risk factors

Introduction: In recent years, cardiovascular risk factors and the role of knowledge about them have received considerable attention in new research. Unfortunately, the rising trend of these diseases in developing countries, including Iran, is disquieting. This study aimed to determine the knowledge of truck and bus drivers about coronary heart disease risk factors. Methods: In this descriptive-analytic study, 300 b...

Journal:
  • Journal of biomedical informatics

Volume 58 Suppl

Publication date: 2015